This report gives the results of running the computer algebra independent integration test.
The download section in the appendix contains links to download the problems, in the plain text format used for all CAS systems.
This report contains 314 integrals. This is test number 59.
The following CAS systems were tested: Rubi, Mathematica, Maple, Maxima, Fricas, Giac, Sympy and Mupad.
Maxima, Fricas and Giac were called through SageMath, using SageMath's integrate command and changing its algorithm argument to select each CAS.
Sympy was called directly from Python.
Important note: A number of problems in this test suite have no antiderivative in closed form. This means the antiderivative of these integrals cannot be expressed in terms of elementary functions, special functions or Hypergeometric2F1 functions. RootSum and RootOf are not allowed.
If a CAS returns such an integral unevaluated within the time limit, the result is counted as passed and assigned an A grade.
However, if a CAS times out on such an integral, it is assigned an F grade even though the integral has no closed-form antiderivative, since the timeout means the CAS could not determine this within the time limit.
If a CAS returns an antiderivative for such an integral, it is automatically assigned an A grade, and this special result is listed in the introduction section of each individual test report so that it is easy to identify, because it can be an important result to investigate.
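The grading rules for integrals without a closed-form antiderivative can be summarized as a small decision function. This is a minimal sketch, not the test suite's own code: the function name and the outcome labels ("unevaluated", "antiderivative", "timeout", "exception") are hypothetical, and grading an exception as F is an assumption taken from the general F definition in the grading table.

```python
from typing import Tuple


def grade_no_closed_form(outcome: str) -> Tuple[str, bool]:
    """Grade a CAS result for an integral with no closed-form antiderivative.

    Returns (grade, flag_for_review). The outcome labels are hypothetical.
    """
    if outcome == "unevaluated":
        # Returning the integral unevaluated within the time limit is a pass.
        return "A", False
    if outcome == "antiderivative":
        # Still graded A, but flagged so the result can be investigated.
        return "A", True
    if outcome in ("timeout", "exception"):
        # The CAS could not decide non-integrability within the time limit.
        return "F", False
    raise ValueError(f"unknown outcome: {outcome!r}")


print(grade_no_closed_form("unevaluated"))     # ('A', False)
print(grade_no_closed_form("antiderivative"))  # ('A', True)
print(grade_no_closed_form("timeout"))         # ('F', False)
```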
The results in the table below reflect these rules.
System | % solved | % Failed |
Rubi | 100.00 ( 314 ) | 0.00 ( 0 ) |
Mathematica | 94.90 ( 298 ) | 5.10 ( 16 ) |
Maxima | 75.80 ( 238 ) | 24.20 ( 76 ) |
Fricas | 66.88 ( 210 ) | 33.12 ( 104 ) |
Mupad | 63.69 ( 200 ) | 36.31 ( 114 ) |
Maple | 61.46 ( 193 ) | 38.54 ( 121 ) |
Giac | 59.24 ( 186 ) | 40.76 ( 128 ) |
Sympy | 35.35 ( 111 ) | 64.65 ( 203 ) |
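The percentages in this table follow directly from the counts out of 314 integrals. The short check below recomputes both columns from the solved counts; the helper name is illustrative only.

```python
TOTAL = 314  # number of integrals in this test


def percent(count: int, total: int = TOTAL) -> float:
    """Share of `total` in percent, rounded to two decimals as in the table."""
    return round(100.0 * count / total, 2)


solved = {"Rubi": 314, "Mathematica": 298, "Maxima": 238, "Fricas": 210,
          "Mupad": 200, "Maple": 193, "Giac": 186, "Sympy": 111}

for name, n in solved.items():
    failed = TOTAL - n
    print(f"{name:<12} {percent(n):6.2f} ({n})  {percent(failed):6.2f} ({failed})")
```

Each printed row matches the corresponding "% solved" and "% Failed" entries above.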
The table below gives an additional breakdown of the grading of the quality of the antiderivatives generated by each CAS. The grading uses the letters A, B, C and F, with A being the best quality. It is done by comparing each generated antiderivative with the optimal antiderivative included in the test suite. The following table describes the meaning of these grades.
grade | description |
A | Integral was solved and the antiderivative is optimal in quality, with a leaf size no larger than twice the optimal antiderivative's leaf size. |
B | Integral was solved and the antiderivative is optimal in quality, but its leaf size is larger than twice the optimal antiderivative's leaf size. |
C | Integral was solved and the antiderivative is non-optimal in quality. This can be due to one or more of the following reasons |
F | Integral was not solved. Either the integral was returned unevaluated within the time limit, or it timed out, or the CAS hung or crashed, or an exception was raised. |
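The decision order in the grading table can be sketched as a function. This is an illustration under assumptions: the function name and arguments are hypothetical, and `optimal_quality` stands in for the quality checks behind a C grade, which are not modelled here.

```python
def grade(solved: bool, optimal_quality: bool,
          leaf_size: int, optimal_leaf_size: int) -> str:
    """Assign A/B/C/F following the order of the grading table."""
    if not solved:
        return "F"
    if not optimal_quality:
        return "C"  # decided by separate quality checks (assumed input)
    if leaf_size > 2 * optimal_leaf_size:
        return "B"  # optimal quality, but more than twice the optimal leaf size
    return "A"


print(grade(True, True, 298, 149))   # A (exactly twice the optimal size)
print(grade(True, True, 299, 149))   # B
print(grade(True, False, 120, 149))  # C
print(grade(False, True, 0, 149))    # F
```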
Grading is implemented for all CAS systems. Based on the above, the following table summarizes the grading for this test suite.
System | % A grade | % B grade | % C grade | % F grade |
Rubi | 100.00 | 0.00 | 0.00 | 0.00 |
Mathematica | 80.25 | 7.01 | 7.64 | 5.10 |
Fricas | 45.86 | 21.02 | 0.00 | 33.12 |
Giac | 35.99 | 23.25 | 0.00 | 40.76 |
Maxima | 33.12 | 42.68 | 0.00 | 24.20 |
Maple | 27.07 | 24.20 | 10.19 | 38.54 |
Mupad | N/A | 40.76 | 0.00 | 36.31 |
Sympy | 17.52 | 17.83 | 0.00 | 64.65 |
The following bar chart illustrates the data in the above table.
The figure below compares the CAS systems for each grade level.
The following table shows the distribution of the different types of failure for each CAS. A CAS can fail in three ways. The first is when the CAS returns the input unevaluated within the time limit, meaning it could not solve the integral. This is the typical, normal failure, labeled F.
The second is a time out: the CAS could not solve the integral within the 3-minute time limit. This is labeled F(-1).
The third is an exception raised during the call, labeled F(-2). This most likely indicates an interface problem between SageMath and the CAS (applicable only to FriCAS, Maxima and Giac), or it could indicate an internal error in the CAS. This type of error requires further investigation to determine the cause.
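The three failure labels can be expressed as a mapping from how a call ended. This is a minimal sketch: the boolean flags are hypothetical fields a test harness might record, not the suite's actual data format. It covers integrals that have a closed-form antiderivative; for the others, an unevaluated result is graded A as described earlier.

```python
from typing import Optional


def failure_code(raised: bool, timed_out: bool, unevaluated: bool) -> Optional[str]:
    """Map how a CAS call ended to the report's failure labels.

    Returns None when the call produced an antiderivative (not a failure).
    """
    if raised:
        return "F(-2)"  # exception: interface problem or internal CAS error
    if timed_out:
        return "F(-1)"  # did not finish within the 3-minute limit
    if unevaluated:
        return "F"      # input returned unevaluated: normal failure
    return None


print(failure_code(raised=False, timed_out=True, unevaluated=False))  # F(-1)
```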
System | Number failed | Percentage normal failure | Percentage timeout failure | Percentage exception failure |
Rubi | 0 | 0.00 % | 0.00 % | 0.00 % |
Mathematica | 16 | 100.00 % | 0.00 % | 0.00 % |
Maple | 121 | 100.00 % | 0.00 % | 0.00 % |
Fricas | 104 | 92.31 % | 7.69 % | 0.00 % |
Giac | 128 | 89.84 % | 8.59 % | 1.56 % |
Maxima | 76 | 100.00 % | 0.00 % | 0.00 % |
Sympy | 203 | 24.63 % | 61.08 % | 14.29 % |
Mupad | 114 | 100.00 % | 0.00 % | 0.00 % |
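As a consistency check, each row's percentages can be converted back to whole numbers of failed integrals, which should add up to the "Number failed" column. The helper below is a hypothetical illustration applied to the rows with mixed failure types.

```python
def implied_counts(total: int, percentages: list) -> list:
    """Convert percentages of `total` back to whole counts."""
    return [round(total * p / 100) for p in percentages]


rows = {
    "Fricas": (104, [92.31, 7.69, 0.00]),
    "Giac": (128, [89.84, 8.59, 1.56]),
    "Sympy": (203, [24.63, 61.08, 14.29]),
}
for name, (failed, pcts) in rows.items():
    counts = implied_counts(failed, pcts)
    # Normal, timeout and exception counts should sum to the total failures.
    print(name, counts, sum(counts) == failed)
# Fricas [96, 8, 0] True
# Giac [115, 11, 2] True
# Sympy [50, 124, 29] True
```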
The table below summarizes the performance of each CAS system in terms of time used and leaf size of results.
Mean size is the average leaf size produced by the CAS (before any normalization). The normalized mean is this mean relative to the mean leaf size of the optimal antiderivatives given in the input files.
For example, if a CAS has a normalized mean of \(3\), then its mean leaf size is 3 times the mean leaf size of the optimal antiderivatives.
Median size is the middle leaf size value: half of the values are larger than it and half are smaller (before any normalization).
Similarly, the normalized median is relative to the median leaf size of the optimal antiderivatives.
For example, if a CAS has a normalized median of \(1.2\), then its median is \(1.2\) times the median leaf size of the optimal antiderivatives.
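Under the definitions above, the normalized statistics are ratios of averages. This sketch assumes both averages are taken over the same integrals (those the CAS solved); the report does not spell out that detail, and the leaf sizes below are made-up numbers chosen to reproduce the 1.2 example, not data from this test.

```python
from statistics import mean, median


def normalized_mean(cas_sizes, optimal_sizes):
    """Mean CAS leaf size divided by the mean optimal leaf size."""
    return mean(cas_sizes) / mean(optimal_sizes)


def normalized_median(cas_sizes, optimal_sizes):
    """Median CAS leaf size divided by the median optimal leaf size."""
    return median(cas_sizes) / median(optimal_sizes)


optimal = [10, 20, 30]  # hypothetical optimal leaf sizes
cas = [12, 24, 36]      # hypothetical CAS leaf sizes for the same integrals
print(normalized_mean(cas, optimal), normalized_median(cas, optimal))  # 1.2 1.2
```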
System | Mean time (sec) | Mean size | Normalized mean | Median size | Normalized median |
Rubi | 0.18 | 199.75 | 0.77 | 149.00 | 1.00 |
Mathematica | 0.96 | 373.96 | 1.35 | 144.00 | 0.93 |
Maple | 1.45 | 5100.87 | 14.40 | 299.00 | 2.37 |
Maxima | 0.28 | 613.99 | 2.19 | 369.00 | 2.37 |
Fricas | 1.18 | 307.70 | 1.38 | 148.50 | 1.48 |
Sympy | 7.75 | 350.24 | 1.99 | 150.00 | 2.48 |
Giac | 5.90 | 1350.18 | 6.44 | 188.00 | 1.59 |
Mupad | 3.88 | 408.51 | 1.64 | 160.00 | 1.54 |
The following are bar charts of the normalized leaf size and time used, from the above table.
The following integrals have no closed-form antiderivative:
{19, 20, 21, 24, 25, 26, 47, 48, 49, 52, 53, 54, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 109, 110, 111, 114, 115, 116, 137, 138, 139, 142, 143, 144, 191, 192, 193, 196, 197, 198, 219, 220, 221, 224, 225, 226, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292}
The following lists, for each CAS, the integrals among these for which it returned an antiderivative:
System | Integrals |
Rubi | {} |
Mathematica | {} |
Maple | {} |
Maxima | {} |
Fricas | {} |
Sympy | {} |
Giac | {} |
Mupad | {} |
The following are integrals solved by a CAS for which the verification phase failed to verify that the antiderivative produced is correct. This does not necessarily mean that the antiderivative is wrong, as additional methods of verification, or more time, might be needed (a 3-minute time limit was used). These integrals are listed here to make it easier to investigate why the result could not be verified.
System | Integrals that failed verification |
Rubi | {} |
Mathematica | {} |
Maple | Verification phase not implemented yet. |
Maxima | Verification phase not implemented yet. |
Fricas | Verification phase not implemented yet. |
Sympy | Verification phase not implemented yet. |
Giac | Verification phase not implemented yet. |
Mupad | Verification phase not implemented yet. |
The command AbsoluteTiming[] was used in Mathematica to obtain the elapsed time for each integrate call. In Maple, the command Usage was used, as in the following example:
cpu_time := Usage(assign('result_of_int', int(expr, x)), output='realtime');
For all other CAS systems, the elapsed time for each integral was found by subtracting the time before the call was made from the time after the call completed, using Python's time.time() call.
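The timing method described for the Python-driven systems amounts to taking two time.time() readings around the call. A minimal sketch follows; the helper name `timed` is hypothetical and `sum` stands in for an integrate call.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds) from two time.time() readings."""
    start = time.time()
    result = fn(*args, **kwargs)
    elapsed = time.time() - start
    return result, elapsed


result, seconds = timed(sum, range(1_000_000))
print(result, f"{seconds:.3f} s")
```

time.time() reads the wall clock, which can be adjusted while a test runs; time.perf_counter() is the monotonic clock Python provides for measuring short durations.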
All elapsed times shown are in seconds. A time limit of 3 CPU minutes was used for each integral. If the integrate command did not complete within this time limit, the integral was aborted, counted as failed and assigned an F grade. The time used by integrals that failed due to time out was not included in the final statistics.
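Excluding timed-out integrals from the time statistics can be sketched as a filter before averaging. The record format, a list of (elapsed_seconds, timed_out) pairs, is an assumption for illustration and not the report's data layout.

```python
from statistics import mean


def mean_time(records):
    """Mean elapsed time over integrals that did not time out.

    `records` is a hypothetical list of (elapsed_seconds, timed_out) pairs.
    """
    times = [elapsed for elapsed, timed_out in records if not timed_out]
    return mean(times) if times else 0.0


runs = [(0.25, False), (180.0, True), (0.75, False)]
print(mean_time(runs))  # 0.5: the 180-second timeout is excluded
```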
A verification phase was applied to the results of integration for Rubi and Mathematica.
Future versions of this report will implement verification for the other CAS systems. For integrals whose result was not run through a verification phase, the antiderivative is assumed to be correct.
The verification phase also had a 3-minute time limit. An integral whose result was not verified could still be correct, but further investigation is needed on those integrals. These integrals are marked in the summary table below, and also in each integral's own section, so they are easy to identify and locate.
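The report does not describe how its verification phase works. One common, simple spot check is to differentiate the antiderivative numerically and compare the result with the integrand at a few sample points. The sketch below illustrates only that general idea; the function name, step size and tolerances are assumptions, and it is not the verification used for Rubi or Mathematica.

```python
import math


def numerically_verify(antiderivative, integrand, points, h=1e-5, tol=1e-6):
    """Check that F'(x) is close to f(x) at each point, via a central difference.

    A heuristic spot check, not a proof: passing at a few points does not
    guarantee correctness, and a failure may come from poles or branch cuts.
    """
    for x in points:
        slope = (antiderivative(x + h) - antiderivative(x - h)) / (2 * h)
        if not math.isclose(slope, integrand(x), rel_tol=tol, abs_tol=tol):
            return False
    return True


def f(x):
    return x * math.cos(x)


def F_good(x):
    return x * math.sin(x) + math.cos(x)  # derivative is x*cos(x)


def F_bad(x):
    return x * math.sin(x)  # derivative is sin(x) + x*cos(x)


print(numerically_verify(F_good, f, [0.5, 1.0, 2.0]))  # True
print(numerically_verify(F_bad, f, [0.5, 1.0, 2.0]))   # False
```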
The following diagram gives a high level view of the current test build system.